Building Decision Trees on Records Linked through Key References
Abstract
We consider the classification problem in which the data are given by a collection of tables related by a hierarchical structure of key references, with the class labels contained in the root table. Each parent table represents a many-to-many relationship type among its child tables. Such data are frequently found in relational databases, data warehouses, XML data, and biological databases. One solution is to join all tables into a universal table based on the recorded relationships, but this suffers from a significant blowup caused by the many-to-many relationships. Another solution is to treat the problem as relational learning, at the cost of increased complexity and degraded performance. We propose a novel method that builds exactly the same decision tree classifier as the one built from the joined table, but without the blowup required by the traditional approach.
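The abstract does not spell out the algorithm. The sketch below illustrates only one general observation that a join-free approach can rely on, and it is not the authors' method. A decision tree's split criterion depends only on attribute-value/class counts. For an attribute stored in a child table, those counts can be obtained by counting class labels per referenced key in the root table and then rolling them up through the child table, so no wide joined rows are needed. The toy schema (a root table relating hypothetical `CUSTOMER_AGE` and `PRODUCT_TYPE` child tables), all helper names, and the use of information gain are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical toy schema (illustrative only, NOT from the paper):
# each root row relates a customer and a product and carries the class label.
ROOT = [  # (customer_key, product_key, class_label)
    ("c1", "p1", "buy"),
    ("c1", "p2", "skip"),
    ("c2", "p1", "buy"),
    ("c2", "p3", "buy"),
    ("c3", "p2", "skip"),
    ("c3", "p3", "buy"),
]
# Hypothetical child tables: key -> value of one attribute.
CUSTOMER_AGE = {"c1": "young", "c2": "old", "c3": "young"}
PRODUCT_TYPE = {"p1": "book", "p2": "game", "p3": "book"}


def avc_without_join(root_rows, key_pos, child_attr):
    """Attribute-value/class counts for a child attribute, computed without
    building joined rows: count labels per referenced key, then roll up."""
    per_key = defaultdict(Counter)  # child key -> class counts
    for row in root_rows:
        per_key[row[key_pos]][row[-1]] += 1
    avc = defaultdict(Counter)  # child attribute value -> class counts
    for key, counts in per_key.items():
        avc[child_attr[key]].update(counts)
    return avc


def universal_join(root_rows):
    """Baseline: materialize the joined table, one wide row per root row."""
    return [(CUSTOMER_AGE[c], PRODUCT_TYPE[p], label) for c, p, label in root_rows]


def avc_from_joined(joined_rows, col):
    """Attribute-value/class counts read directly from the joined table."""
    avc = defaultdict(Counter)
    for row in joined_rows:
        avc[row[col]][row[-1]] += 1
    return avc


def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)


def info_gain(avc):
    """Information gain of a multiway split, using only the AVC counts."""
    overall = Counter()
    for counts in avc.values():
        overall.update(counts)
    n = sum(overall.values())
    remainder = sum(sum(c.values()) / n * entropy(c) for c in avc.values())
    return entropy(overall) - remainder


if __name__ == "__main__":
    joined = universal_join(ROOT)
    # Key position in ROOT happens to equal the column position in the joined rows.
    for name, pos, attr in [("age", 0, CUSTOMER_AGE), ("type", 1, PRODUCT_TYPE)]:
        fast = avc_without_join(ROOT, pos, attr)
        assert fast == avc_from_joined(joined, pos)  # identical split statistics
        print(name, round(info_gain(fast), 3))
```

Because both routes yield identical counts, any count-based criterion (information gain here; Gini would behave the same way) selects the same split. At a deeper tree node, the same function can be applied to just the root rows that reach that node. The intermediate `per_key` table holds at most one counter per distinct referenced child key, rather than one wide row per root row.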
Similar resources
The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution
This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as a linked or unlinked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...
Improving Classifications of Medical Data Based on Fuzzy ART2 Decision Trees
Analyzing given medical databases provides valuable references for classifying other patients’ symptoms. This study presents a strategy for discovering fuzzy decision trees from medical databases, in particular Haberman’s Survival database and the Blood Transfusion Service Center database. Haberman’s Survival database helps doctors treat and diagnose a group of patients who show similar past medi...
Knowledge Discovery through SysFor - a Systematically Developed Forest of Multiple Decision Trees
Decision tree based classification algorithms like C4.5 and Explore build a single tree from a data set. The two main purposes of building a decision tree are to extract various patterns/logic-rules existing in a data set, and to predict the class attribute value of an unlabeled record. Sometimes a set of decision trees, rather than just a single tree, is also generated from a data set. A set o...
A Survey of Information Theory Application on Data Mining
In the data mining area, "classification" is one of the most important issues. Generating decision trees is a very useful and reliable approach. There are several ways to construct a decision tree; among them, Information Theory is a very effective and scalable method. This is a survey project in Information Theory. We focus on the generation of decision tree for classific...
Feature Generation using Ontologies during Induction of Decision Trees on Linked Data
Linked data has the potential of interconnecting data from different domains, bringing new potential for machine agents to provide better services for web users. The ever-increasing amount of linked data in government open data, social linked data, and linked medical and patients’ data provides new opportunities for data mining and machine learning. Both are, however, strongly dependent on the select...
Publication date: 2005